Non-coding 886 (nc886/vtRNA2-1), the epigenetic odd duck – implications for future studies

ABSTRACT Non-coding 886 (nc886, vtRNA2-1) is the only human polymorphically imprinted gene in which the methylation status is not determined by genetics. The existing literature regarding the establishment, stability and consequences of the methylation pattern, as well as the nature and function of the nc886 RNAs transcribed from the locus, is contradictory. For example, the methylation status of the locus has been reported to be stable through life and across somatic tissues, but also susceptible to environmental effects. The nature of the produced nc886 RNA(s) has been redefined multiple times, and in carcinogenesis, these RNAs have been reported to have conflicting roles. In addition, due to the bimodal methylation pattern of the nc886 locus, traditional genome-wide methylation analyses can lead to false-positive results, especially in smaller datasets. Herein, we aim to summarize the existing literature regarding nc886, discuss how the characteristics of nc886 give rise to contradictory results, and reinterpret, reanalyse and, where possible, replicate the results presented in the current literature. We also introduce novel findings on how the distribution of the nc886 methylation pattern is associated with the geographical origins of the population and describe the methylation changes in a large variety of human tumours. Through the example of this one peculiar genetic locus and RNA, we aim to highlight issues in the analysis of DNA methylation and non-coding RNAs in general and offer our suggestions for what should be taken into consideration in future analyses.


Background
Non-coding 886 (nc886, HGNC symbol vtRNA2-1, previously also referred to as pre-miR-886, CBL3 and hvg-5) is the only known polymorphically imprinted gene in humans, the variation of which is not caused by genetic factors [1-5]. In population cohorts of mainly European ancestry, approximately 75% of individuals have been reported to present a maternally imprinted region in this locus (chr5:136078784-136080957, GRCh38 [2]), while approximately 25% present two non-methylated alleles [1,6-8] (Figure 1a). A non-coding RNA is transcribed from the locus, but its nature has been debated in the literature, i.e., whether the full-length RNA forms a hairpin structure and binds directly to proteins or is cleaved into miRNA-like short RNAs [9,10]. The expression of this RNA is regulated by the methylation pattern in the gene locus [1,11,12], but also by the genetic variation downstream of the gene [1].
There are a number of studies showing how preconceptional or prenatal conditions associate with the methylation pattern of this gene [1,7,8,13,14], and both the methylation pattern and RNAs expressed from this locus have been linked to health traits [1,7] and morbidities [15,16], making it a candidate for a molecule mediating the Developmental Origins of Health and Disease (DOHaD) hypothesis [17]. However, due to the bimodal DNA methylation pattern of the nc886 locus and the considerable and stable physiological variation in the nc886 RNA expression, conservative analysis methods may produce false-positive or inconsistent results. Herein, we discuss why and how the characteristics of nc886 give rise to the contradictory results presented in current literature, in addition to providing new, reanalysed and replicated data. We pinpoint the unanswered questions regarding nc886 and emphasize the requirements for future studies regarding this genetic locus and the RNA(s) produced from it.

Discovery of nc886 RNA(s)
Non-coding 886 was first discovered as miR-886-3p and miR-886-5p by human short RNA sequencing [18]. It was soon also identified in rhesus macaques [19] and included in miRBase version 10. For an RNA to be considered as a miRNA with high confidence, it should produce a mature product of approximately 22 nt in length, have a hairpin-structured precursor and be phylogenetically conserved, and the decrease in Dicer enzyme function should lead to increased precursor molecule levels [20]. These requirements were met by the nc886 primary transcript, which is cleaved into two short RNAs. In 2009, however, nc886 RNA was shown to have significant sequence similarities with vault RNAs [9,21] and to co-sediment with intact vault particles [22], thus leading to the removal of miR-886 from miRBase (version 16). The RNA was suggested to be a novel vault RNA (vtRNA) [23] and renamed as vtRNA2-1, which remains the HGNC-accepted official name of the gene. Like the three previously known vault RNAs (vtRNA1-1, 1-2 and 1-3), nc886 is encoded at 5q31, has a suitable length for a vault RNA, and has been suggested to form similar secondary structures to those formed by the other members of the vault family [24].
Figure 1. (a) Distribution of the median methylation levels of cg07158503, cg11608150, cg06478886, cg04481923, cg18678645, cg06536614, cg25340688, cg26896946, cg00124993, cg08745965 and cg18797653, located in the DMR overlapping nc886 (GSE40279). Two clusters can be observed: individuals with a 50% methylation level (~75% of the population) and those with a methylation level close to 0% (~25% of the population). In the former, the maternal allele is methylated [6-8,42,43] and the paternal allele unmethylated and permissive for transcription, whereas in the latter cluster, both alleles are unmethylated and permissive for transcription. (b) Schematic presentation of the nc886 gene, nc886 DMR and the CTCF-binding sites flanking the DMR. The telomeric CTCF-binding site has been suggested to interact with another binding site near the IL9 gene, bringing a suggested enhancer region close to the nc886 gene [1,27]. Made with BioRender.com.
Since then, the identity of the nc886 transcript both as a miRNA and as a vault RNA has been questioned. In 2011, Lee et al. recharacterized the nc886 RNA and concluded that the stem-loop of nc886 has qualities distinct from classical pre-miRNA and, unlike the majority of pre-miRNAs, its production is independent of Drosha. Dicer was also reported to cleave the stem-loop into two ~20-nt RNAs with poor efficiency, and the levels of these short RNAs were shown to be so low that they could not be detected in the majority of the analysed samples [25]. The nc886 RNA was also shown to be transcribed by RNA polymerase III (RNA Pol III) [26], while the majority of human miRNAs are transcribed by RNA polymerase II. On the other hand, it was demonstrated that the sequence similarity to the previously known vtRNAs is mostly limited to regions relevant to transcription by Pol III, that the expression pattern of nc886 differs from other vtRNAs and that vault proteins and the nc886 RNA do not co-localize [25]. The Dicer-dependent, but Drosha-independent, production of the small RNAs derived from this region was later verified by Miñones-Moyano et al. and Fort et al. [9,16]. However, Fort et al. demonstrated that the produced small RNAs associate with Argonaute, satisfying the attributes of miRNA, and further hypothesized that the suggested recent evolutionary origin of the nc886 gene would be the reason for the poor efficiency of Dicer in producing the mature miRNAs [9,27]. This work was again followed by the work of Lee et al., describing the issues arising from the classification of nc886 as a miRNA and further emphasizing their previous view of nc886 being distinct from pre-miRNAs [10].
Regardless of the classification of nc886, the gene has been shown to code for a 101-nt-long non-coding RNA, transcribed by Pol III [26,28], which is then cleaved into two short RNAs (nc886-3p, 23 nt, and nc886-5p, 24-25 nt) with rather low efficiency [9,25]. The full-length nc886 is highly expressed in at least cancer cell lines (10^5 copies per cell in HeLa cells), the transcripts are localized in the cytoplasm in an organized way [25,29], and the half-life of the transcripts has been reported to be short (71.5 min) [12,15,25].

Function of nc886 RNAs
In the literature, the function of the nc886 RNAs has been suggested to be mediated either by the short RNAs acting as miRNAs [1,9] or by a direct binding of the hairpin structure to protein kinase R (PKR) (nucleotides 46-59) [28,30,31,44] or to 2-5-oligoadenylate synthetase 1 (OAS1) [32]. Lee et al. have suggested a tumour-surveillance model where the binding of the nc886 RNA prevents PKR from being activated and suppresses apoptosis [25,33]. On the other hand, the short nc886 RNAs have been repeatedly predicted to regulate pathways linked with cancer development and insulin signalling [1,9]. It has also been suggested that the effects of the nc886 RNA would be at least partly caused by its binding to RISC and the subsequent hindering of the processing of other miRNAs [29]. The expression levels of either the full-length nc886 RNA or the short derivatives have been associated with infection [23,34-37], allergy [38], asthma [39] and skin senescence [40,41]. Since most of these studies argue for the existence of nc886-derived small RNAs based on arrays or TaqMan probes, the identity of the quantified RNAs must be interpreted with caution, as discussed by Lee et al. [10]. All in all, the consequences of nc886 RNA expression have been scarcely studied outside of cancer (reviewed later), and many open questions remain about its function in physiological conditions.

Methylation status of the nc886 locus
The nc886 gene is located in the only known canonically polymorphically imprinted region, the methylation status of which is not associated with genetics in humans [1-3,5]. In detail, there is an approximately 1,600-bp-long differentially methylated region (DMR), including the nc886 coding sequence, which is flanked by two CTCF-binding sites that are hypothesized to insulate the DMR (Figure 1b) [2,7]. Somatic diploid cells present each genomic site twice (maternal and paternal copies), and just as the methyl group in one DNA strand can be either present or lacking, the DNA methylation of a CpG site in one cell can be either 0%, 50% or 100%. When analysing samples with a mixture of cells, the DNA methylation of a given CpG site usually becomes a continuous variable, with values ranging from 0% to 100% (or from 0 to 1 in beta-values), as the sample is composed of different proportions of cells with the previously mentioned DNA methylation statuses. However, the DMR overlapping nc886 presents a mostly bimodal methylation pattern, where approximately 75% of individuals have 50% methylation in the region (hereafter referred to as imprinted individuals) and approximately 25% of individuals have methylation levels close to 0% (hereafter referred to as non-methylated individuals), indicating two non-methylated alleles [1,6-8] (Figure 1b). Studies on family units [42,43] and gametes [6-8,43] strongly suggest that it is the maternal allele that is methylated in individuals with a 50% methylation level in this locus. It must be highlighted that the bimodal methylation pattern poses challenges for both data processing [45] and analysis, as the majority of genome-wide analyses rely on linear regression methods and are thus based on the assumption that DNA methylation in a given site is a continuous, normally distributed variable.
Due to the parent-of-origin-dependent methylation pattern, nc886 is considered to be an imprinted gene. Canonically imprinted genes present a parent-of-origin-dependent gene expression pattern, with either the maternal or paternal locus silenced via epigenetic mechanisms, including DNA methylation [46]. Typically, imprinted genes present a methylation level of 50% in all somatic cells and tissues [47]. The epigenetic profiles maintaining the imprint are established during gametogenesis, when the existing DNA methylation pattern is first erased, followed by the creation of the parent-of-origin-related DNA methylation pattern [48]. This pattern is then retained throughout an individual's life. In addition, tissue- or developmental-stage-specific imprinting can be observed in, for example, the placenta [49,50]. The significance of intact genetic imprints is highlighted by the severe disorders caused by imprinting defects [51]. We have previously shown that individuals who present multilocus imprinting disturbances also present an altered methylation pattern at the nc886 locus [6], indicating that there are similarities in imprint establishment and/or maintenance of the nc886 imprint and the more typical non-polymorphically imprinted genes.
Similarly to canonically imprinted genes, individuals present the same methylation pattern of the nc886 DMR in the majority of their somatic tissues, regardless of the germ layer from which the tissue originates [6,8,13]. We have previously reported that, among the 30 studied somatic tissues, only skeletal muscle and the cerebellum are exceptions. In skeletal muscle, all individuals present an imprinted nc886 profile with a 50% methylation level, and, in the cerebellum, all individuals present a methylation level of approximately 75%, indicating biallelic methylation in some of the cells [6]. Results from Olsen et al. indicate that the bimodal methylation pattern is also lost from granulosa cells and that the methylation of the nc886 locus in these specific cells might be associated with age [52]. Upon analysing somatic tissues that were not included in the previous study, we now report that the methylation patterns of breast, testis and prostate tissues are also distinct from what can be observed in most somatic tissues, including blood (Supplementary Figure S1). Similarly to skeletal muscle, the bimodal methylation pattern cannot be observed in these tissues.
A small percentage of humans (1%-6%) present intermediate methylation levels (20%-40%) at the nc886 locus and are thus chimeric, consisting of both non-methylated and imprinted cells. We have shown that the intermediate methylation pattern in blood is not due to different proportions of cell types [1] and that it can also be observed across tissues [6]. Like the non-methylated and imprinted status, the intermediate status is also stable through time [1,13]. Furthermore, a few individuals (~0.1% of the population) present methylation levels of over 60% [1] (Supplementary Figure S2), indicating that the paternal allele has also gained methylation in some proportion of the cells, which therefore express very low levels of nc886 RNAs. This implies that at least certain cells can survive with very low nc886 expression.
nc886 is an evolutionarily young gene. It can only be found in primates, guinea pigs and some members of the squirrel family [7,27,35]. In all of the apes analysed previously (n = 106), none presented a non-methylated epigenotype of nc886, indicating that the polymorphic imprinting of this locus is human-specific [27]. The centromeric CTCF site, which is also evolutionarily young, is relevant for the existence of the methylation pattern in the region, as only primates with an intact binding sequence of CTCF present the nc886 imprint [27].
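The four methylation status groups described above can be illustrated with a minimal Python sketch that assigns one individual to a group from the median beta-value of the DMR probes. The function names and the exact cut-offs are hypothetical: the text gives 20%-40% for the intermediate group and over 60% for the over-methylated group, the remaining boundaries are assumptions, and the thresholds actually used for the figures are those in the supplementary materials.

```python
from statistics import median

# Hypothetical cut-offs on the beta-value scale (0 to 1). The text gives
# 20%-40% for intermediate and >60% for over-methylated individuals; treating
# everything below 0.20 as non-methylated and 0.40-0.60 as imprinted is an
# assumption made for this sketch only.
NON_METHYLATED_MAX = 0.20
INTERMEDIATE_MAX = 0.40
IMPRINTED_MAX = 0.60


def classify_nc886(probe_betas):
    """Return (median beta, status label) for one individual's DMR probes."""
    if not probe_betas:
        raise ValueError("at least one probe beta-value is required")
    m = median(probe_betas)
    if m < NON_METHYLATED_MAX:
        status = "non-methylated"
    elif m < INTERMEDIATE_MAX:
        status = "intermediate"
    elif m <= IMPRINTED_MAX:
        status = "imprinted"
    else:
        status = "over-methylated"
    return m, status


def imprinted_cell_fraction(median_beta):
    """Fraction of imprinted cells implied by a sample-level median beta.

    Assumes the sample is a mixture of only two cell populations:
    non-methylated cells (beta 0) and imprinted cells (beta 0.5).
    The result is clipped to the range 0..1.
    """
    return min(max(median_beta / 0.5, 0.0), 1.0)
```

Under the two-population assumption, an intermediate individual with a median beta of 0.25 would correspond to roughly half of the cells carrying the imprint; this simple mixture model does not cover over-methylated individuals, in whom the paternal allele is also partly methylated.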

Methylation of the nc886 locus is associated with nc886 RNA levels
The regulation of gene expression through DNA methylation is a nuanced system. The best-described mechanism is the association between repressed gene expression and methylated CpG islands overlapping the transcription start site of a gene [53]. Methylation in the nc886 DMR has been shown to regulate nc886 RNA expression in several in vitro settings, including 5-aza-2′-deoxycytidine treatment [11,29,54-56]. We and Treppendahl et al. have also shown that the intrinsic DNA methylation pattern in blood is associated with differences in nc886 expression levels [1,11]. In our data, the nc886-5p levels are increased twofold in non-methylated individuals (both alleles permissive for expression) in comparison to imprinted individuals (only one allele permissive for expression). Furthermore, individuals presenting intermediate DNA methylation levels in the nc886 epiallele also present intermediate levels of nc886 RNAs [1] (Figure 1). Work by Park et al. describes how the methylation in the region leads to the formation of heterochromatin and the region being unavailable for RNA Pol III. They also suggest that, in open chromatin, MYC binds to an E-box upstream of the nc886 gene and then interacts with RNA Pol III, enabling transcription [12]. Fort et al. have also demonstrated a negative correlation between the nc886 promoter methylation and chromatin accessibility in normal and tumour tissues [37]. In addition to this epigenetic regulation of nc886 expression, we have shown that genetic variation 100-200 kb downstream of the nc886 gene is associated with nc886 RNA levels [1]. As both the genetic profile and the DNA methylation pattern are stable in most somatic tissues and throughout an individual's lifespan, there is already considerable physiological variation in nc886 RNA expression in the healthy human population, which should be considered when investigating the levels of these RNAs in relation to morbidities.

Establishment of the nc886 imprint
Originally, Romanelli et al. suggested that the methylation pattern of nc886 arises 4-6 days into embryonic development [43]. However, the results are based on cell lines derived from a limited number of individuals. A reanalysis of DNA methylation profiling data from the same cell lines (GSE52576) [57] shows that the parthenogenetically activated oocytes and embryonic stem cells contain both non-methylated and imprinted cell lines (Supplementary Figure S3), indicating that these cell lines could actually be reflecting the intrinsic nc886 methylation pattern of the oocyte. Immortalization and the creation of iPSCs have been shown to affect the methylation pattern of imprinted genes, including nc886, which should be taken into consideration when interpreting results from in vitro studies [6,12]. Our results with identical twins separated between days 1 and 3 after fertilization [6] and the analysis of oocyte data by Carpenter et al. [8] also suggest that the methylation pattern is already established in the oocyte. Although DNA methylation profiling data are currently available from oocytes in many studies, the coverage of the nc886 DMR in bisulphite sequencing data is generally so low that no definitive conclusion can be drawn. To some extent, this also includes the data of Okae et al. [58], upon which the conclusions by Carpenter et al. [8] and Jima et al. (https://jb2.humanicr.org/) [59] are based.
Although conclusive data are lacking, current results point in the direction of the nc886 methylation pattern being established in the oocyte, in line with non-polymorphic maternally imprinted genes [60]. In mice, these imprints are established asynchronously during the growth and maturation of the oocyte in prophase II [61]. As the nc886 polymorphic imprint is unique to humans, the establishment of the methylation pattern is difficult to study because the growth phase of the oocyte lasts from the early weeks after the birth of the woman to the formation of the secondary follicle during menstruation, which can span up to 50 years. The establishment of the intermediate and overmethylated patterns, on the other hand, likely occurs during the de- and re-methylation of the embryonic genome, as identical twins separated before implantation can present methylation differences of up to 17% [6]. This also fits the timing of the de- and re-methylation of the embryonic genome [62] (Figure 2). Thus, we suggest that there are at least two distinct mechanisms contributing to the nc886 methylation patterns observed in humans: the initial establishment of the pattern during oocyte maturation and the subsequent maintenance of this pattern during embryonic development.
One of the burning questions relating to nc886 is what leads 25% of studied individuals to present the non-methylated epigenotype. Unlike with other polymorphically imprinted genes, targeted and genome-wide genetic analyses have failed to discover a genetic cause for the pattern [1,3,8,11,13]. Based on 30,347 studied individuals from 32 datasets, we previously reported that in singletons of European ancestry, the proportion of imprinted individuals varies by less than 5%, ranging from 72.6% to 77.5%. When considering other ethnicities or populations consisting of twins, more variation could be observed [6]. In our previous study, populations of African descent had a higher proportion of imprinted individuals (79.1% to 78.7%), whereas Asian populations included fewer imprinted individuals (68.2% to 65.8%), although the differences were not drastic and the number of populations of other than European ancestry was limited [6]. A smaller percentage of imprinted individuals (65.9%, n = 82) in an Asian population is also described by You et al. [63].
To provide more information on the topic, we investigated DNA methylation data from data repositories with a focus on populations of other than European ancestry and were able to replicate our previous findings (Figure 3). When inspecting DNA methylation data from populations of African ancestry, the highest percentage of imprinted individuals observed was 88% in ǂKhomani San, although the sample population is limited in size (GSE99029, n = 57) [64]. The pattern is similar among American populations of African descent [65], as well as other African populations [66]. On the other hand, the percentage of imprinted individuals is lower in Asian populations, and the lowest percentage of individuals with an imprinted nc886 was 55%, which was observed in populations from the Indonesian archipelago [67] (Figure 3). It should be noted that, within populations, differences can be seen between specific living locations. Therefore, we cannot rule out the possibility that local genetic or environmental factors could affect the percentage of individuals with an imprinted nc886 locus, but we also note that the observed differences could be explained by the small numbers of individuals in the specific subpopulations (Supplementary Figure S4). As the establishment of the methylation pattern has not been associated with genetic variation [1-3,5], it is perplexing that systematic variation in the percentage of imprinted individuals was observed across populations. These results again highlight the need for more EWAS and mQTL analyses to be performed in non-European populations [68].
The prevalence of non-methylated individuals has been associated with maternal age [1,7,69], the season of conception [7,13], maternal nutrition [13] or folic acid supplementation [70], family socioeconomic status [1] and maternal alcohol consumption [8]. Of these, only the association between maternal age and a higher prevalence of non-methylated children has been shown in more than one cohort [7,13]. We further aimed to replicate the association between the season of conception and the prevalence of imprinted individuals. In the dataset in which this was first reported, the analysis was performed with nc886 methylation as a continuous variable [13]. Later, Carpenter et al. [7] analysed the data as a categorical variable, but removed the intermediately methylated individuals as having an 'inconclusive' methylation pattern. We again reanalysed the data by dividing individuals into imprinted and others to avoid losing individuals in a cohort with a limited size (n = 120). While the finding that links nc886 methylation to the season of conception remains nominally significant, it is not replicated in a dataset with a similar study setting (GSE99863) (Figure 4a). All in all, more research with larger datasets is needed to understand if, and especially how, preconceptional or prenatal maternal traits associate with the nc886 methylation profile. It remains to be established whether these traits change the methylation status of the oocyte or embryo, or whether a non-methylated nc886 region is beneficial for the success of fertilization or the survival of the foetus in certain conditions.
If the proportions of nc886 methylation status groups in a population are affected by mechanisms related to the establishment and maintenance of the imprint, but also to the chances of a live birth in certain conditions, it would explain how the nc886 methylation pattern in populations could be affected by perinatal or periconceptional conditions while being simultaneously already established in the maturing oocyte. Furthermore, the combined contribution of three distinct mechanisms (establishment, maintenance, and survival bias) might mask the potential genetic component in the pattern, thus explaining the differences between populations with different geographical origins while simultaneously explaining the negative results observed in genetic analyses.
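The caveat about small subpopulations can be made concrete with a confidence interval for an observed proportion. The following is a minimal sketch, not part of the original analyses: it computes a Wilson score interval, and the counts in the usage lines (54 of 82 and 50 of 57) are assumptions back-calculated from the percentages and sample sizes quoted in the text (65.9%, n = 82, and 88%, n = 57).

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of imprinted individuals, n: cohort size,
    z: normal quantile (1.96 corresponds to an approximate 95% interval).
    Returns (lower, upper) on the 0..1 scale.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts below are assumed from the reported percentages, not taken from raw data.
asian_cohort = wilson_interval(54, 82)    # 65.9% imprinted, n = 82
khomani_san = wilson_interval(50, 57)     # 88% imprinted, n = 57
print(asian_cohort, khomani_san)
```

With cohorts of this size, each interval spans well over ten percentage points, which is the scale of the between-population differences being discussed.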

Adulthood traits affecting nc886 methylation status
The nc886 methylation levels in blood have been shown to be stable for up to 25 years of follow-up in adulthood [1] and from childhood to adolescence [13]. In contrast to these findings, a multitude of studies have reported, in cross-sectional settings, that methylation status in this locus is affected by outside exposures after birth or in adulthood, including post-natal famine [14], adulthood exposure to pesticides or gases/fumes [71], miner dust [72], smoking [73], wildfire-related fine particulate matter [74], and long-term aircraft and railway noise [75]. One explanation for these conflicting findings is that the qualities of nc886 methylation have not been taken into account and, in these analyses, the methylation in the locus has been treated as a continuous variable, which can lead to false-positive findings. We have now reanalysed the available dataset related to post-natal famine [14] by categorizing individuals based on their nc886 methylation status. No difference was seen in the prevalence of imprinted individuals between those exposed to famine either pre- or postnatally and controls, when reanalysing the data as categorical (Figure 4b). It should be noted that we were only able to analyse the whole data, not the selected subset utilized in the work by Finer et al. [14].
Figure 3. Percentages of imprinted (in colour) and non- and intermediately methylated individuals in the nc886 locus in population cohorts with ancestral origins in Africa (green), Europe (blue) and Asia (orange). The percentage of imprinted individuals in populations with ancestral origins in Africa is higher, and in populations originating from Asia lower, than the 75% previously reported in populations of European ancestry [1]. Populations marked with a star are from the same sample series (GSE36369), and these populations are thus free of technical or sample collection bias when compared to each other. Data processing and thresholds for imprinted individuals are presented in the supplementary materials and methods and in Supplementary Figure S2.
Unfortunately, other datasets in which the abovementioned associations were discovered are not publicly available and thus could not be reanalysed with nc886 methylation treated as a categorical variable. Furthermore, some of the findings are based on a very limited number of individuals, and in the case of prenatal famine, for example, similar studies have not been able to replicate the associations [76-78] (Figure 4). Therefore, a reanalysis of the datasets and replication are needed to rule out false-positive findings.
Figure 4. (a) The finding (GSE59592 [13]) associating lower levels of nc886 methylation to the season of conception remains statistically significant even after clustering individuals into imprinted and non-methylated or intermediately methylated (p = 0.049), but this finding is not replicated in the other Gambian population cohort available. The significant association between postnatal exposure to famine (b) and Parkinson's disease (c) disappears when the nc886 methylation pattern is treated as a categorical variable and is not replicated in datasets with similar or related study settings. Data processing and thresholds for imprinted individuals are presented in supplementary materials and methods and in Supplementary Figure S2.

nc886 and later-life health traits
When treated as a categorical variable, the nc886 methylation pattern has been associated with metabolic traits. We have shown that, in comparison to imprinted individuals, non-methylated individuals have higher insulin and glucose levels in childhood and adolescence, non-methylated boys have higher HDL and non-HDL cholesterol levels in childhood, and non-methylated small children present with higher adiposity [1]. In line with this, van Dijk et al. showed that non-methylated children were at an increased risk of obesity, a trend which they replicated in a dataset consisting of 355 healthy young individuals (GSE73103) [79]. The maternally imprinted nc886 locus has also been reported to be associated with an increased risk of preterm birth (n = 82) [63], but this interesting discovery still needs to be replicated in a larger dataset.
Many publications report associations between nc886 methylation status and the risk of morbidities, while treating the methylation in this region as a continuous variable. Lower methylation of nc886 has been associated with an increased risk of orofacial clefts [80], IgA nephropathy [56] and Parkinson's disease [81]. When reanalysing the nc886 methylation pattern as a categorical variable, it is clear that, in the discovery cohort of IgA nephropathy (GSE72364, n = 12 [81]), four out of the six controls present the non-methylated epigenotype and all of the six affected individuals are imprinted, generating the Δβ > 0.3 (Supplementary Figure S2). With such a small population, the probability of having one group consisting of only imprinted individuals is 18%, and nominally significant results would be obtained by having 3 out of 6 individuals be non-methylated in the other group. Similarly, in the data of Henderson et al. (GSE165083 [81]), the noticeable but statistically non-significant difference in the nc886 methylation status groups between Parkinson's disease cases and controls (imprinted vs others in the Chi-squared test p = 0.09, n = 28) is generated by 8 vs 12 imprinted individuals in each group (Supplementary Figure S2). Furthermore, no difference can be observed in the nc886 methylation patterns of Parkinson's patients and controls in GSE145361 [82], with 1,889 individuals, or in GSE111629, with 572 individuals [6,83] (Figure 4c). Similar issues can be seen when reporting the associations of nc886 RNA expression and phenotypes in small sample settings. For example, Miñones-Moyano et al. report elevated nc886-5p levels in the brains of Parkinson's disease patients in comparison to controls (n < 40 per setting). A clustered expression pattern can be observed in the amygdala, the frontal cortex and the substantia nigra of Parkinson's patients, with patients in the motor stages of the disease (Figure 1 in Miñones-Moyano et al. [16]). As the nc886 RNA levels are strongly regulated by the methylation pattern, their results could be caused by the uneven distribution of imprinted and non-methylated individuals among cases and controls. In support of this, the aforementioned connection to Parkinson's disease cannot be detected in the cerebellum, where the bimodal methylation pattern cannot be observed [6,16]. Similarly, it would be interesting to see whether the associations between nc886 RNA levels and IgA nephropathy would remain significant after taking into account the stable regulators of the RNA levels [84]. A reanalysis with the methylation and genetic regulators included in the model would be warranted to confirm these results.
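The small-sample arithmetic quoted above for a six-versus-six comparison can be reproduced with a short Python sketch. It rests on stated assumptions rather than on the original analyses: each individual is independently imprinted with probability 0.75, imprinted and non-methylated individuals are idealised as having beta-values of exactly 0.5 and 0, and, because the text does not say which test "nominally significant" refers to, an uncorrected Pearson chi-squared test on the 2×2 table is used for illustration. All function names are hypothetical.

```python
from math import comb, erfc, sqrt

P_IMPRINTED = 0.75   # approximate share of imprinted individuals (from the text)
GROUP_SIZE = 6       # individuals per group in the discovery cohort


def binom_pmf(k, n, p):
    """Probability of exactly k imprinted individuals among n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Chance that one group of six consists of imprinted individuals only.
p_all_imprinted = P_IMPRINTED ** GROUP_SIZE


def prob_abs_delta_beta_at_least(threshold, n=GROUP_SIZE, p=P_IMPRINTED):
    """Chance that two random groups of n differ in mean beta by >= threshold.

    Idealisation (assumption): imprinted beta = 0.5, non-methylated beta = 0,
    so the mean difference is 0.5 * |k1 - k2| / n.
    """
    total = 0.0
    for k1 in range(n + 1):
        for k2 in range(n + 1):
            delta = 0.5 * abs(k1 - k2) / n
            if delta >= threshold - 1e-12:
                total += binom_pmf(k1, n, p) * binom_pmf(k2, n, p)
    return total


def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-squared statistic and p-value (1 df) for
    the table [[a, b], [c, d]]; assumes no row or column total is zero."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # survival function of chi-squared, 1 df
    return chi2, p_value


print(f"P(all six imprinted) = {p_all_imprinted:.3f}")
print(f"P(|delta beta| >= 0.3 by chance) = {prob_abs_delta_beta_at_least(0.3):.4f}")
# Rows: cases, controls; columns: imprinted, non-methylated.
print("6/0 vs 3/3:", chi2_2x2(6, 0, 3, 3))
```

The first quantity matches the 18% given in the text. Whether a 6/0 versus 3/3 split counts as significant depends on the test chosen, which is one more reason to report the categorical counts alongside any continuous Δβ.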

nc886 and cancer
Changes in nc886 methylation level and RNA levels are widely reported in cancer.nc886 RNAs have been found to be upregulated in cervical cancer [85][86][87], breast cancer [86,88], high-grade bladder cancer [89], high-grade prostate cancer [90,91], multiple myeloma [92], endometrial cancer [32] and renal carcinoma [93], while downregulation has been reported in cholangiocarcinoma [30], oesophageal cancer [54,94], prostate cancer [95,96], ovarian cancer [97], breast cancer [98], thyroid cancer [99,100], small-cell lung cancer [101], squamous cell lung carcinoma [102] and oral squamous cell carcinoma [103].Notably, in breast and prostate cancers, both directions are reported.Several molecular mechanisms could explain the changes in nc886 transcription in cancer.The expression of genes transcribed by RNA pol III, such as nc886, has been reported to be upregulated in cancer.The transcription of nc886 is also activated by transcription factor MYC, the levels of which are often also upregulated in cancerous cells [12], theoretically leading to even higher expression rates of nc886 RNAs.On the other hand, hypermethylation of the nc886 locus has been reported in several types of cancer [43,55], leading to the formation of heterochromatin and the repression of the transcription [12].
Knocking out or knocking down nc886 RNA expression has been shown both to induce apoptosis or suppress proliferation [25,28,44,85–88,91,93,104] and to promote cell division [105]. Similarly, overexpression of these RNAs has been shown to have both growth-promoting [85,86,91,93,106] and growth-restricting [9,54,95,99,103,107–109] properties. Elevated levels of the nc886 RNAs or lower levels of DNA methylation in the region have also been linked to both a poorer [29,91,101,110–112] and a better [11,54,96,105,107,108,113,114] prognosis of the malignant disease. The issues of investigating the roles of alleged miRNAs in cell cultures with overexpression or knock-out have been described in detail by Lee et al., who emphasize the selection of suitable methods to produce a natural nc886 molecule in physiological concentrations, in addition to correct methods for evaluating the success of knock-down [10]. In the case of nc886, the stable methylation pattern and the genetic variation in the enhancer region lead to substantial intrinsic differences in nc886 RNA levels between individuals [1]. These differences should be taken into account when studying associations between nc886 RNA levels and end points in small sample sets or in limited numbers of cell lines, as uneven distribution of the stable regulatory profiles in cases and controls may lead to false-positive findings.
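The confounding described above can be illustrated with a minimal sketch. The RNA levels, group labels and sample sizes below are hypothetical values invented for illustration; they are not taken from any of the cited datasets. In this toy example, the cases happen to include more non-methylated individuals than the controls, so a pooled comparison suggests a difference that disappears once the comparison is made within each methylation group.

```python
from statistics import mean

# Hypothetical (methylation group, status, RNA level) records; invented for illustration.
samples = [
    ("non-methylated", "case", 2.0), ("non-methylated", "case", 2.1),
    ("non-methylated", "case", 1.9), ("imprinted", "case", 1.0),
    ("non-methylated", "control", 2.0), ("imprinted", "control", 1.0),
    ("imprinted", "control", 1.1), ("imprinted", "control", 0.9),
]


def mean_difference(records):
    """Mean RNA level in cases minus mean RNA level in controls."""
    cases = [level for _, status, level in records if status == "case"]
    controls = [level for _, status, level in records if status == "control"]
    return mean(cases) - mean(controls)


def stratified_differences(records):
    """Case-control difference computed separately within each methylation group."""
    groups = sorted({group for group, _, _ in records})
    return {g: mean_difference([r for r in records if r[0] == g]) for g in groups}


pooled = mean_difference(samples)
by_group = stratified_differences(samples)

print(f"pooled difference: {pooled:.2f}")
for group, diff in by_group.items():
    print(f"{group}: {diff:.2f}")
```

In a real analysis, the same idea would typically be expressed by including the methylation status group (and the known genetic regulators) as covariates in the model.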
Changes in the methylation pattern of the nc886 locus have also been linked to cancer [37,43,55,115]. However, the majority of these studies do not take into account the bimodal distribution in normal tissue or the fact that changes in the DNA methylation pattern have different consequences depending on the original methylation status. If the methylation level increases by 20 percentage points, it will lead to very different results depending on whether the inherent methylation level was 0% or 50%. In studies where the original methylation pattern has been considered, both hyper- and hypomethylation have been reported [43,55,115], and both have also been reported to predict poorer survival [37,115]. Also, the innate DNA methylation status of the nc886 locus has been shown to predict the future risk of prostate [4,5], breast [5,42] and pancreatic cancer [116], although the study by Wang et al. treats nc886 methylation as a continuous variable and reports rather inconsistent results [116]. These studies suggest that individuals with non-methylated nc886 might be at an increased risk of developing cancer, which would, at least in theory, be in line with the suggested growth-promoting role of maternally imprinted genes, such as nc886, during foetal development [117].
Ignoring the special qualities of nc886 methylation can also hide biologically relevant results. As an example, when comparing the methylation levels in the region between clear cell renal cell carcinoma (GSE61441) and adjacent normal tissue, a statistically non-significant (p = 0.06) hypomethylation can be detected in the tumours. When comparing the within-individual differences between normal and tumour tissue, 65% (30/46) of the tumours show hypomethylation of more than 2%, while 13% (6/46) present hypermethylation of over 2%, suggesting a pattern of hypomethylation but also stark differences between individual tumours. Intriguingly, hypomethylation can also be seen in individuals presenting low methylation levels in healthy tissue (Figure 5).
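The difference between a group-level comparison and a within-individual comparison can be sketched as follows. The beta values are hypothetical, and the 0.02 cut-off assumes that the 2% threshold above refers to an absolute difference in methylation level; neither is taken from GSE61441.

```python
from statistics import median


def classify_changes(normal, tumour, threshold=0.02):
    """Summarize per-individual methylation changes between paired samples."""
    if len(normal) != len(tumour):
        raise ValueError("paired samples must have equal length")
    # Tumour minus normal for each individual: negative means hypomethylation.
    diffs = [t - n for n, t in zip(normal, tumour)]
    hypo = sum(d < -threshold for d in diffs)
    hyper = sum(d > threshold for d in diffs)
    return {
        "n": len(diffs),
        "hypo": hypo,
        "hyper": hyper,
        "unchanged": len(diffs) - hypo - hyper,
        "median_diff": median(diffs),
    }


# Hypothetical paired beta values (same individual at the same index).
normal = [0.48, 0.50, 0.05, 0.47, 0.52, 0.04]
tumour = [0.40, 0.44, 0.02, 0.55, 0.51, 0.00]

summary = classify_changes(normal, tumour)
print(summary)
```

Because each tumour is compared with its own adjacent tissue, an individual starting near 0% and an individual starting near 50% are both counted by the direction of their own change, which a comparison of group medians cannot show.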
Similarly to Fort et al. [37], we retrieved data from The Cancer Genome Atlas Program (TCGA) to investigate the nc886 methylation pattern in cancerous tissue, but unlike in previous literature, we investigated the methylation patterns instead of comparing mean methylation in the nc886 promoter region. In line with previous literature, we discovered that the changes in methylation are cancer-type-specific [37]. While the methylation pattern in some cancers, such as acute myeloid leukaemia, remains mostly unaltered, in others, such as malignant melanoma, the bimodal methylation pattern observed in healthy skin is completely lost (Figure 5). In the majority of cancer types, some degree of hypermethylation and loss of control of the methylation pattern can be observed (Figure 5). In comparison to the bimodal pattern in blood, more systematic hypermethylation can be observed in brain cancers, with the exception of gliomas, as well as in prostate and breast cancers. Hypomethylation in comparison to healthy tissue can be observed in renal carcinomas. Both healthy and cancerous testes present lower methylation levels than blood, in addition to an absence of the bimodal methylation pattern. Similarly, in both healthy breast and prostate tissue, the bimodal methylation pattern is not present, as the majority of individuals present methylation levels of roughly 50%, and hypermethylation can be seen in tumours of these tissues (Figure 5).
It will be interesting to study whether it is the methylation level itself, the degree of hypo- or hypermethylation, or the loss of the established pattern that is associated with cancerous development and future prognosis. One can hypothesize that the loss of imprint in this region could be a sign of epigenetic instability and thus associated with a poorer prognosis, regardless of the direction of change in the methylation levels. However, the original methylation pattern of the individual and tissues of origin, as well as the proportions of the methylation status groups in a population, should always be taken into account when interpreting the obtained results. For example, the findings from a cross-sectional study suggesting that the human papilloma virus is associated with similar changes in the nc886 methylation profile in both cancerous and healthy tissue could also be explained by different proportions of imprinted individuals in the case and control groups at baseline [104]. Due to the peculiar nature of nc886 methylation, longitudinal analyses would be exceptionally beneficial to determine its role in cancer progression.

Conclusions
nc886 has been shown to be associated with both periconceptional conditions and adulthood health traits, but many of these results have not been replicated in independent cohorts. Results obtained from small cohorts, in particular, warrant reanalysis and replication. Current knowledge indicates that the DNA methylation pattern in the nc886 locus is established during oocyte maturation [6,8], remains stable during life and in the majority of somatic tissues, but is associated with at least maternal age at birth [1,7]. In this case, it should be contemplated whether nc886 can be considered to mediate the DOHaD hypothesis, as the methylation status as such is not affected by environmental conditions. However, if the methylation pattern in a population is shaped by pregnancy success, nc886 could mediate the genetics-independent adaptation of the population to the surrounding conditions.
Approximately 75% of individuals of European ancestry present an imprinted nc886 locus, while populations of African descent present a higher percentage of imprinted individuals and Asian populations have systematically lower proportions of imprinted individuals. Genetic and epigenetic analyses of populations with more diverse backgrounds are needed to understand this pattern. In previous work, nc886 methylation has been associated with metabolic traits, and immortalization and carcinogenesis have been shown to alter the inherent DNA methylation pattern [12]. We demonstrate herein that the changes in the methylation pattern are cancer-type- and tissue-of-origin-specific. However, the specific role of nc886 RNAs in the development of malignancy remains unclear. Similarly, the true nature of nc886 RNAs is still under debate. As the hairpin structure that has been suggested to bind PKR is formed by nucleotides 46-59 and the short RNAs are suggested to be processed from nucleotides 1-24 and 80-101, both hypotheses could potentially be correct.
Figure 5 (continued). (b) The bimodal methylation pattern of nc886 is already missing in the healthy breast, prostate and testis samples, with breast and prostate cancer demonstrating hypermethylation, whereas the pattern in testicular tumours corresponds to the hypomethylated unimodal pattern also observed in the healthy tissue. (c) Interestingly, renal carcinomas are observed to present hypomethylation in comparison to healthy tissue, while some malignant rhabdoid tumours of the kidneys are hypermethylated. (d) When analysing median nc886 methylation from the healthy adjacent tissue and the tumour in clear cell adenocarcinoma (GSE61441) [135], a non-significant minor hypomethylation can be observed in the tumours, but an analysis of changes at the individual level reveals a more systematic hypomethylation pattern. The start of the arrow represents the methylation level of the healthy tissue and the end of the arrow the methylation level of the tumour, with each arrow representing one individual. Data processing and thresholds for imprinted individuals are presented in supplementary materials and methods and in Supplementary Figure S5.

Great inherent variation in nc886 RNA expression within populations, caused by the stable genetic and epigenetic regulators, can lead to incorrect interpretations when it is assumed that all differences between cases and controls are due to the studied condition. Furthermore, the bimodal methylation pattern of nc886 warrants post-hoc analyses every time the region is discovered in an EWAS. These notions can be generalized to the study of DNA methylation and the expression of gene products regulated via DNA methylation. Genetic variation and other features, such as sex, can contribute to methylation levels that are categorical rather than continuous variables. Especially in studies with small datasets, it would be important to inspect the distribution of identified methylation sites. Furthermore, even among continuous features, there is potentially physiological variation in the levels of the measured epigenetic profile that is not caused by the condition in question. Even though nc886 codes for only a few peculiar RNAs and is located in an atypical locus, the contradicting results presented herein highlight the fact that, in this era of genome-wide bioinformatic analyses and vast amounts of data, researchers should take time to study their top findings further in order to avoid reducing science to a mere reporting of statistically significant values.
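One simple way to inspect the distribution of a methylation site before modelling it as a continuous variable is a bimodality check. The sketch below uses a simplified, moment-based form of the bimodality coefficient, (skewness² + 1) / kurtosis, without a finite-sample correction. The 5/9 reference value is the coefficient of a uniform distribution and is used here only as a rough, assumed cut-off; the example values are hypothetical.

```python
def bimodality_coefficient(values):
    """Return (skewness**2 + 1) / kurtosis computed from population moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis (3 for a normal distribution)
    return (skewness ** 2 + 1) / kurtosis


# Assumed rule of thumb: values above the uniform-distribution reference
# hint at more than one mode and call for a closer look at the histogram.
UNIFORM_REFERENCE = 5 / 9

# Hypothetical beta values: three individuals near 50% and one near 0%.
two_clusters = [0.5, 0.5, 0.5, 0.0]
# Hypothetical beta values scattered around a single level.
one_cluster = [0.4, 0.5, 0.5, 0.5, 0.6]

bc_two = bimodality_coefficient(two_clusters)
bc_one = bimodality_coefficient(one_cluster)
print(f"two clusters: {bc_two:.2f}, one cluster: {bc_one:.2f}")
```

A coefficient of this kind is only a screening aid: it can also be inflated by strong skewness, so a flagged site should still be inspected visually before it is treated as categorical.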

Implications for future studies
(1) The DNA methylation pattern in the nc886 locus should be treated as a categorical variable to avoid reporting false-positive findings due to the random distribution of non-methylated and imprinted individuals among cases and controls.
(2) While studying the DNA methylation pattern in cancer, hypo- and hypermethylation of nc886 should be reported only when the intrinsic DNA methylation pattern has been taken into account in the analysis.
(3) When investigating the associations between nc886 RNA levels and phenotypes, the stable genetic and epigenetic regulation pattern of the individuals or cell lines should be taken into account to avoid false-positive results caused by the uneven distribution of the stable regulatory profiles of nc886 RNAs in small case-control settings.
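As a minimal sketch of how these recommendations could be applied, individuals can first be assigned to a methylation status group, and the group proportions can then be compared between cases and controls with Fisher's exact test. The cut-offs in `classify` and the beta values are hypothetical placeholders, not the thresholds described in the supplementary materials and methods.

```python
from math import comb


def classify(median_beta, low=0.20, imprinted=(0.40, 0.60)):
    """Assign a methylation status group; the cut-offs are illustrative assumptions."""
    if median_beta < low:
        return "non-methylated"
    if imprinted[0] <= median_beta <= imprinted[1]:
        return "imprinted"
    return "intermediate"


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def count(groups, label):
    return sum(g == label for g in groups)


# Hypothetical median beta values for a small case-control set.
cases = [0.03, 0.05, 0.02, 0.49]
controls = [0.51, 0.47, 0.50, 0.04]

case_groups = [classify(v) for v in cases]
control_groups = [classify(v) for v in controls]

# Intermediately methylated individuals (none here) are left out of the 2x2 table.
table = (count(case_groups, "non-methylated"), count(case_groups, "imprinted"),
         count(control_groups, "non-methylated"), count(control_groups, "imprinted"))
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

In this toy set, three of four cases but only one of four controls are non-methylated, yet the imbalance is entirely compatible with chance. The same imbalance would shift the mean methylation level between the groups considerably if the locus were analysed as a continuous variable.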

Open questions relating to nc886

Figure 1. (a) Distribution of the median methylation levels of cg07158503, cg11608150, cg06478886, cg04481923, cg18678645, cg06536614, cg25340688, cg26896946, cg00124993, cg08745965 and cg18797653, located in the DMR overlapping nc886 (GSE40279). Two clusters can be observed: individuals with a 50% methylation level (~75% of the population) and those with a methylation level close to 0% (~25% of the population). In the former, the maternal allele is methylated [6–8,42,43] and the paternal allele unmethylated and permissive for transcription, whereas in the latter cluster, both alleles are unmethylated and permissive for transcription. (b) Schematic presentation of the nc886 gene, nc886 DMR and the CTCF-binding sites flanking the DMR. The telomeric CTCF-binding site has been suggested to interact with another binding site near the IL9 gene, bringing a suggested enhancer region close to the nc886 gene [1,27]. Made with BioRender.com.

Figure 2. Establishment of the nc886 methylation pattern. The methylation pattern of the non-methylated or imprinted nc886 locus is suggested to be established during the maturation of the oocyte. We further hypothesize that the intermediate methylation pattern is caused by the sporadic loss of methylation during the global de-methylation of the embryonic genome or the gain of methylation during the re-methylation. After implantation, the methylation pattern, as well as the approximate proportions of non-methylated and imprinted cells in intermediately methylated individuals, remain unchanged in the majority of somatic tissues. Made with BioRender.com.

Figure 3. Percentages of imprinted (in colour) and non- and intermediately methylated individuals in the nc886 locus in population cohorts with ancestral origins in Africa (green), Europe (blue) and Asia (orange). The percentage of imprinted individuals in populations with ancestral origins in Africa is higher, and in populations originating from Asia lower, than the 75% previously reported for populations of European ancestry [1]. Populations marked with a star are from the same sample series (GSE36369), and these populations are thus free of technical or sample collection bias when compared to each other. Data processing and thresholds for imprinted individuals are presented in the supplementary materials and methods and in Supplementary Figure S2.

Figure 4. Reanalysing and attempting to replicate findings linking the nc886 methylation pattern to (a) the season of conception [13] (GSE59592 and GSE99863), (b) postnatal (10.5061/dryad.k67kf) and periconceptional (GSE116379) exposure to famine [14], and (c) Parkinson's disease (GSE165081, GSE145361 and GSE111629). (a) The original discovery by Silver et al. (GSE59592) [13] associating lower levels of nc886 methylation with the season of conception remains statistically significant even after clustering individuals into imprinted and non-methylated or intermediately methylated (p = 0.049), but this finding is not replicated in the other Gambian population cohort available. The significant associations of the nc886 methylation pattern with postnatal exposure to famine (b) and Parkinson's disease (c) disappear when the methylation pattern is treated as a categorical variable, and they are not replicated in datasets with similar or related study settings. Data processing and thresholds for imprinted individuals are presented in supplementary materials and methods and in Supplementary Figure S2.

Figure 5. (a-c) The median methylation level of nc886 in tumour samples (in black) from The Cancer Genome Atlas Program (TCGA) and from corresponding healthy tissues (in grey). (a) In some cancer types, the original bimodal methylation pattern is faithfully or near-faithfully maintained (acute myeloid leukaemia and hepatocellular carcinoma), while in others, the pattern is completely lost (malignant melanoma). In tumours from the cerebrum (excluding glioblastoma), a systematic hypermethylation can be observed.
(1) How and when is the nc886 DMR established, and do the preconceptional/prenatal conditions modulate the DNA methylation pattern? Does the non-methylated pattern, for example, provide a survival advantage to the foetus in non-optimal pregnancy conditions? What causes the distinct patterns of the nc886 methylation status in cohorts of different geographical origins? How are the intermediate and overmethylated nc886 methylation patterns established?
(2) Is the functional form of nc886 RNAs the 101-nt-long hairpin structure that binds to proteins, the two short RNAs that are produced by Dicer and act in a miRNA-like manner, or both? What is the function of nc886 RNA(s) in physiological conditions?
(3) Does nc886 have a causal role in carcinogenesis, or are the changes in DNA methylation pattern and RNA expression consequences of epigenetic instability?

Laboratoriolääketieteen edistämissäätiö sr., the Pirkanmaa Regional Fund of the Finnish Cultural Foundation, Signe och Ane Gyllenbergs stiftelse, state funding for university-level health research, Tampere University Hospital, the Wellbeing Services County of Pirkanmaa [9AC077, 9X047, 9S054, 9AB059 and T63074], the Yrjö Jahnsson Foundation [grants 20207299 and 20197212], and the Finnish Foundation for Cardiovascular Research.